PEL: Position-Enhanced Length Filter for Set Similarity Joins
Authors
Abstract
Set similarity joins compute all pairs of similar sets from two collections of sets. Set similarity joins are typically implemented in a filter-verify framework: a filter generates candidate pairs, possibly including false positives, which must be verified to produce the final join result. Good filters produce few false positives while spending little time on hopeless candidates. The best known algorithms generate candidates using the so-called prefix filter in conjunction with length- and position-based filters. In this paper we show that the potential of length and position has only partially been leveraged. We propose a new filter, the position-enhanced length filter, which exploits the matching position to incrementally tighten the length filter; our filter identifies hopeless candidates and avoids processing them. The filter is very efficient, requires no change in the data structures of most prefix filter algorithms, and is particularly effective for foreign joins, i.e., joins between two different collections of sets.
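To illustrate how a matching position can tighten a length filter, the following is a minimal sketch for the Jaccard similarity, derived from the standard overlap bound rather than taken from the paper. It assumes a threshold t, a probing set r of size |r|, and a first match at 0-based position pos in r, with no earlier token of r occurring in the candidate s. Under these assumptions at most |r| - pos tokens can overlap, and Jaccard >= t requires an overlap of at least t/(1+t) * (|r| + |s|), which yields |s| <= (|r| - pos * (1 + t)) / t. The function names `length_bounds` and `pel_upper_bound` are hypothetical.

```python
from fractions import Fraction
from math import ceil, floor


def length_bounds(size_r, t):
    """Plain Jaccard length filter: t * |r| <= |s| <= |r| / t."""
    return ceil(t * size_r), floor(size_r / t)


def pel_upper_bound(size_r, pos, t):
    """Position-dependent upper bound on |s| (illustrative assumption).

    Assumes the first match is at 0-based position `pos` in r, so at most
    size_r - pos tokens of r can overlap with s. Combining this with the
    required overlap t/(1+t) * (|r| + |s|) gives the bound below.
    """
    return floor((size_r - pos * (1 + t)) / t)


# Exact rational threshold avoids floating-point rounding at the boundary.
t = Fraction(4, 5)
print(length_bounds(10, t))
for pos in range(3):
    # The admissible maximum length shrinks as the match position grows.
    print(pos, pel_upper_bound(10, pos, t))
```

At position 0 the bound coincides with the ordinary length filter; every later position shrinks the range of candidate lengths that still need to be considered.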
Similar resources
An Empirical Evaluation of Set Similarity Join Techniques
Set similarity joins compute all pairs of similar sets from two collections of sets. We conduct extensive experiments on seven state-of-the-art algorithms for set similarity joins. These algorithms adopt a filter-verification approach. Our analysis shows that verification has not received enough attention in previous works. In practice, efficient verification inspects only a small, constant num...
Fixed-point FPGA Implementation of a Kalman Filter for Range and Velocity Estimation of Moving Targets
Tracking filters are extensively used within object tracking systems to provide consecutive smooth estimates of the position and velocity of the object with minimum error. In particular, the Kalman filter and its numerous variants are widely known as simple yet effective linear tracking filters in many diverse applications. In this paper, an effective method is proposed for designing and implementa...
3D motion vector coding with block base adaptive interpolation filter on H.264
Fractional-pel motion compensation generally improves coding efficiency owing to more precise motion representation and the low-pass filtering effect of generating the image at fractional-pel positions. In H.264, quarter-pel motion compensation is applied, where the image at half-pel positions is generated by a 6-tap Wiener filter. And the adaptive interpolation filter technique, which adaptively changes filter chara...
Bitmap Filter: Speeding up Exact Set Similarity Joins with Bitwise Operations
The Exact Set Similarity Join problem aims to find all similar sets between two collections of sets, with respect to a threshold and a similarity function such as overlap, Jaccard, Dice or cosine. The naïve approach verifies all pairs of sets and is often considered impractical due to the high number of combinations. So, Exact Set Similarity Join algorithms are usually based on the Filter-Verif...
Similarity Joins in Relational Database Systems
State-of-the-art database systems manage and process a variety of complex objects, including strings and trees. For such objects, equality comparisons are often not meaningful and must be replaced by similarity comparisons. This book describes the concepts and techniques to incorporate similarity into database systems. We start out by discussing the properties of strings and trees, and identify t...